Neural network-based method for visual recognition of driver’s voice commands using attention mechanism
Abstract
Visual speech recognition, or automated lip-reading, is widely used for speech-to-text conversion. Video data is useful in multimodal speech recognition systems, particularly when acoustic data is hard to use or unavailable. The main purpose of this study is to improve driver command recognition by analyzing visual information, thereby reducing touch interaction with vehicle systems (multimedia and navigation systems, phone calls, etc.) while driving. We propose a method for automated lip-reading of the driver’s speech while driving, based on a deep neural network with the 3DResNet18 architecture. Adding a bi-directional LSTM model and an attention mechanism to the network yields higher recognition accuracy at the cost of a slight decrease in performance. Two variants of the neural network architecture for visual speech recognition are proposed and investigated. The first variant recognized the driver’s voice commands with an accuracy of 77.68 %, which is 5.78 percentage points lower than the 83.46 % achieved by the second. System performance, measured by the real-time factor (RTF), is 0.076 for the first architecture and 0.183 for the second, which is more than twice as high. The proposed method was tested on the multimodal corpus RUSAVIC, recorded in a car. The results of the study can be used in audio-visual speech recognition systems, which are recommended in high-noise conditions such as driving a vehicle. In addition, the analysis makes it possible to choose the optimal neural network model of visual speech recognition for subsequent incorporation into an assistive system based on a mobile device.
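The trade-off reported in the abstract can be checked with a few lines of arithmetic. The sketch below assumes the common definition of the real-time factor (processing time divided by the duration of the input), which the abstract does not spell out. The function name `real_time_factor` is hypothetical, and only the four reported figures come from the text.

```python
def real_time_factor(processing_seconds: float, input_seconds: float) -> float:
    """Assumed definition: time spent processing divided by input duration.

    Values below 1.0 mean the system runs faster than real time.
    """
    if input_seconds <= 0:
        raise ValueError("input duration must be positive")
    return processing_seconds / input_seconds


# Figures reported in the abstract for the two architecture variants.
rtf_first, rtf_second = 0.076, 0.183
acc_first, acc_second = 77.68, 83.46

# How many times more processing time the second variant needs per second of video.
slowdown = rtf_second / rtf_first

# Accuracy gain of the second variant, in percentage points.
accuracy_gain = acc_second - acc_first

print(f"slowdown: {slowdown:.2f}x, accuracy gain: {accuracy_gain:.2f} points")
```

Under this definition both variants stay well below an RTF of 1.0, so each processes video faster than real time; the second buys its accuracy gain with roughly 2.4 times the processing cost.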
Articles in current issue
- Determination of the action type of hydrate formation inhibitors by their infrared spectra
- Application of Raman spectroscopy to study the inactivation process of bacterial microorganisms
- Numerical study of the effect of methemoglobin concentration in the blood on the absorption of light by human skin
- Low-temperature cell for IR Fourier spectrometric investigation of hydrocarbon substances
- Peculiarities of growing Ga1–xInxAs solid solutions on GaAs substrates in the field of a temperature gradient through a thin gas zone
- An enhanced AES-GCM based security protocol for securing the IoT communication
- Attacks based on malicious perturbations on image processing systems and defense methods against them
- Brain MRT image super resolution using discrete cosine transform and convolutional neural network
- Text augmentation preserving persona speech style and vocabulary
- Verification of event-driven software systems using the specification language of cooperating automata objects
- Intelligent adaptive testing system
- Brain tumour segmentation in MRI using fuzzy deformable fusion model with Dolphin-SCA
- Optimization of human tracking systems in virtual reality based on a neural network approach
- Errors in the demodulation algorithm with a generated carrier phase introduced by the low-pass filter
- Modeling of the process of spherical form correction for rotors of electrostatically suspended gyros
- Method of spatial multiplexing in multi-antenna communication systems
- Modeling and simulation of heat exchanger with strong dependence of oil viscosity on temperature
- Approach to the generalized parameters formation of the complex technical systems technical condition using neural network structures
- Numerical simulation of gas dynamics during operation of a wide-range rocket nozzle with a porous insert
- The exact solution of a shock wave reflection problem from a wall shielded by a gas suspension layer
- Adaptive observer for state variables of a time-varying nonlinear system with unknown constant parameters and delayed measurements
- RuLegalNER: a new dataset for Russian legal named entities recognition